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Abstract 

This paper presents a new approach to conditional inference, based 
on the simulation of samples conditioned by a statistics of the data. Also 
an explicit expression for the approximation of the conditional likelihood 
of long runs of the sample given the observed statistics is provided. It is 
shown that when the conditioning statistics is sufficient for a given pa- 
rameter, the approximating density is still invariant with respect to the 
parameter. A new Rao-Blackwellisation procedure is proposed and simu- 
lation shows that Lehmann Scheffe Theorem is valid for this approxima- 
tion. Conditional inference for exponential families with nuisance param- 
eter is also studied, leading to Monte carlo tests. Finally the estimation 
of the parameter of interest through conditional likelihood is considered. 
Comparison with the parametric bootstrap method is discussed. 

Keywords: Conditional inference, Rao Blackwell Theorem, Lehmann 
Scheffe Theorem, Exponential families, Nuisance parameter, Simulation. 

1 Introduction and context 

This paper explores conditional inference in parametric models. A comprehen- 
sive overview on this area is the illuminating review paper by Reid (1995) [27] . 
Our starting point is as follows: given a model V defined as a collection of con- 
tinuous distributions Pg on M'' , with density pg where the parameter 9 belongs 
to some subset O in Mf and given a sample of independent copies of a random 
variable with distribution P^t f^i' some unknown value 6t of the parameter, we 
intend to provide some inference about 6t conditioning on some statistics of the 
data. The situations which we have in mind are of two different kinds. 

The first one is the Rao-Blackwellisation of estimators, which amounts to 
reduce the variance of an unbiased estimator by conditioning on any statistics; 
it is a known fact that such method reduces its variance; when the conditioning 
statistics is complete and sufficient for the parameter then this procedure pro- 
vides optimal reduction, as stated by Lehmann- Scheffe Theorem. This realm of 
questions is the motivation for the first part of this paper: 

1. is it possible to provide good approximations for the density of a sample 
conditioned on a given statistics, and, when applied for a model where 



some sufficient statistics for the parameter is known, does sufficiency w.r.t. 
the parameter still holds for the approximating density? 

2. in the case when the first question has positive answer, is it possible to 
simulate samples according to the approximating density, and to propose 
some Rao-Blackwellised version for a given preliminary estimator? Also 
we would hope that the proposed method would be feasible, that the 
programming burden would be light, that the run time for this method be 
short, and that the involved techniques would keep in the range of globally 
known ones by the community of statisticians. 

The second application of conditional inference pertains to the role of con- 
ditioning in models with nuisance parameters. There is a huge bibliography on 
this topic, some of which will be considered in details in the sequel. The usual 
frame for this field of problems is the exponential families one, for reasons re- 
lated both with the importance of these models in applications and on the role 
of the concept of sufficiency when dealing with the notion of nuisance parame- 
ter. Conditioning on a sufficient statistics for the nuisance parameter produces a 
new exponential family, which gets free of this parameter, and allows for simple 
inference on the parameter of interest, at least in simple cases. This will also 
be discussed, since the reality, as known, is not that simple, and since so many 
complementary approaches have been developped over decades in this area. Us- 
ing the approximation of the conditional density in this context and performing 
simulations yields Monte Carlo tests for the parameter of interest, free from the 
nuisance parameter. Also conditional maximum likelihood estimators will be 
produced. Comparison with the parametric bootstrap will also be discussed. 

This paper is organized as follows. Section 2 describes a general approxima- 
tion scheme for the conditional density of long runs of subsamples conditioned 
on a statistics, with explicit formulas. The (rather lengthy) proof of the main 
result of this section is presented in Broniatowski and Caron(2011) 4 . Discus- 
sion about implementation is provided. Section 3 presents two aspects of the 
approximating conditional scheme: we first show on examples that sufficiency 
is kept under the approximating scheme and, second, that this yields to an 
easy Rao-Blackwellisation procedure. An illustration of Lehmann-Scheffe The- 
orem is presented. Section 4 deals with models with nuisance parameters in the 
context of exponential families. We have found it useful to spend a few para- 
graphs on bibliographical issues. We address Monte Carlo tests based on the 
simulation scheme; in simple cases its performance is similar to that of paramet- 
ric bootstrap; however conditional simulation based tests improve clearly over 
parametric bootstrap procedure when the test pertains to models for which the 
likelihood is multimodal with respect to the nuisance parameter; an example 
is provided. Finally we consider conditioned maximum likelihood based on the 
approximation of the conditional density; in simple cases its performance is sim- 
ilar to that of unconditional likelihood; however when the preliminary estimator 
of the nuisance is difficult to obtain, for example when it depends strongly on 
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some initial point for a Newton- Raphson routine (this is indeed a very common 
situation), then, by the very nature of sufficiency, conditional inference based 
on the proxy of the conditional likelihood performs better; this is illustrated 
with examples. 

2 The approximate conditional density of the 
sample 

Most attempts which have been proposed for the approximation of conditional 
densities stem from arguments developped in Lehmann (1986) [TB] for inference 
on the parameter of interest in models with nuisance parameter; however the 
proposals in this direction hinge at the approximation of the distribution of the 
sufficient statistics for the parameter of interest given the observed value of the 
sufficient statistics of the nuisance parameter. We will present some of these 
proposals in the section devoted to exponential families. To our knowledge, no 
attempt has been made to approximate the conditional distribution of a sample 
(or of a long subsample) given some observed statistics. 

However, generating samples from the conditional distribution itself (such 
samples are often called co-sufficient samples, following Lockhart et al.(2007) 
[20] ) has been considered by many authors; see for example Engen and Lillegard 
(1997) [12], Lindqvist et al. (2003) [T^ and references therein, and Lindqvist and 
Taraldsen (2005)^181. 

In Engen and Lillegard (1997) [12], simulating exponential or normal samples 
under the given value of the empirical mean is proposed. For example under 
the exponential distribution Exp{9), the minimal sufficient statistics for 9 is 
the sum of the observations, say t„; a co-sufficient sample x* can be created by 
generating an x -sample from Exp{l) and taking x* — x^tn/x . However, this 
approach may be at odd in simple cases, as for the Gamma density in the non 
exponential case. 

Lockhart et al. (2007) [2^ proposed a different framework based on the Gibbs 
sampler, simulating the conditioned sample one at a time through a sequential 
procedure. The example which is presented is for the Gamma distribution under 
the empirical mean, but it seems to perform well, for location parameter, when 
the true parameter is in some range, therefore not uniformly on the model. Their 
paper contains a comparative study with the parametric bootstrap procedure 
(introduced by Efron fl979)|ll]) for similar problems. In a simple case, they 
argue favorably for both methods. We will turn back to parametric bootstrap in 
relation with conditional likelihood estimators, in the last section of this paper. 

Other techniques have been developped in specific cases: for the inverse 
gaussian distribution see O'Reilly and Gravia-Medrano (2006) [22], Cheng (1984) 
[8]; for the Weibull distribution see Lockhart et Stephens (1994) [21]. No unified 
technique exists in the litterature which would work under general models. 
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2.1 Approximation of conditional densities 



2.1.1 Notation and hypotheses 

For sake of clearness we consider the case when the model "P is a family of 
distributions on R. Extension to K'^,d > 1 can be achieved in the same way, 
using similar results developed in futur work. 

Denote X" := (Xi, ..,X„) a set of n independent copies of a real random 
variable X with density px.St ^- Let x" := (xi, ...,x„) denote the observed 
values of the data, each x^ resulting from the sampling of X^. Define the r.v. 
U := M (X) and Ui^„ :— u (Xi) + ... + 7i (X„) where m is a real- valued measurable 
function on R, and, accordingly, ui_„ :— u (xi) + ...+« (x„) . Denote pv^Ot the 
density of the r.v. U. We consider approximations of the density of the vector 
li-i — (Xi, .., Xj.) on R when Ui^„ — wi.n- It will be assumed that the observed 
value ui^n is "typical", in the sense that it keeps in the range of the iterated 
logarithm law order of magnitude (for large n) . Large deviation cases could also 
be handled, but conditional inference is based on the implicit assumption that 
such cases are excluded from the analysis. We hence assume 

hm sup ■ = 1. (LIL) 

n^oo ^yVar{u {lC))^/2rr\og\ogn 

We propose an approximation for 

Pui.^fiT (a^l) := PeT(2;i|Ul,„ =Ui^n) (1) 

where Xj;' := (Xi, .., X^.) and fc := fc„ is an integer sequence such that 

< lim sup k/n < 1 (Kl) 



together with 



lim n k — oo (K2) 



which is to say that we approximate ^^Ot {^i) O'^ long runs. The rule which 
define the value of k for a given accuracy of the approximation is stated in 
section 3.2 of Broniatowski and Caron(2011) [1]. 

The hypotheses pertaining to the function u and the r.v. U = w (X) are as 
follows 



1. M is real valued and the characteristic function of the random variable U 
is assumed to belong to L"^ for some r > 1. 

2. The r.v. U is supposed to fulfill the Cramer condition: its moment gen- 
erating function satisfies 

4'u{t) EexptV < oo 
for t in a non void neighborhood of 0. 
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Define the functions m{t),s'^{t) and /i3(i) as the first, second and third 
derivatives of log (pu (t) . Denote 

^ exptu{x) 

with m(t) = a and a belongs to the support of Px,9t^ the distribution of X. 
The density tt" is the tilted density with parameter a. Also it is assumed that 
this latest definition of t makes sense for all a in the support of X. Conditions 
on 0u(i) which ensure this fact are referred to as steepness properties, and are 
exposed in Barndorff-Nielsen(1978) 1 , pl53. 

We introduce a positive sequence e„ which satisfies 



lim EriV n — k = oo (El) 



lim e„(log7i)^ =0. (E2) 

2.2 The proxy of the conditional density of the sample 

We recusively define the density {x\) on R'^, which approximates {xi) 

sharply with relative error smaller than e„ (log n)'^ . The subsript 9t will be 
omitted when there is no ambiguity about the value of the parameter. 
Set 

Wo := ui^n/n. 

and 

5o(a;i|xo) := TrTi^i) 

with xq arbitrary, and for 1 < i < fc — 1 define the density g{xi+i \ x\) recursively 
as follows. 

Set ti the unique solution of the equation 

rui := m{ti) = '^^^ — — !^ (2) 

n — I 

where ui^i :— u(xi) + ... + u{xi). The tilted adaptive family of densities tt^* is 
the basic ingredient of the derivation of approximating scheme. Let 

s^ ■.^^{logE^'^.exptu (X)) (0) 

and 

— (log exptu(X)) (0) , j = 3,4 
which are the second, third and fourth cumulants of vr™* . Let 

g{xi+i\x\) = Cipyi^0^{xi+i)n(al3,l3,u{xi+i)) (3) 
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where n (/i, r, x) is the normal density with mean fi and variance r at x. Here 

(3^sUn~t~l) (4) 

and the is a normahzing constant. 
Define 

fc-i 

5ui,„(a;J) :=go(a;i|a;o)]^5(xi+i|a;l). (6) 

i=l 

It holds 

Theorem 1 Assume iKl\KS\) together with ltEllES\) . Then (i) 
and (a) 



Pu,,Axi) = 5«i,„(a;J)(l + op„^ ^ (e„ (logn)^)) 



Pni,„ (a;?) = 5«i,„(a;f )(1 + og„^ „ (e„ (logJi)^)). 
For the proof, see Broniatowski and Caron (2011) [3]. 

Statement (i) means that the conditional likelihood of any long sample path 
given Ui^„ = ui_„ can be approximated by with a small relative 

error on typical realizations of X" . 

The second statement states that simulating Xj under ^ produces runs 
which could have been sampled under the conditional density p^^ ^ since gu^ „ 
and „ coincide sharply on larger and larger subsets of M.'^ as n increases. 

Remark 2 Theorem[J\ states that the density g^^^ „,(6t,»?t) ''^ approximates 
Pui „,{eT,riT) '"^ sample x" generated under (9t, rjr) ■ However, in some cases, 
the r.v.'s Xj 's in Theorem\^ may at time he generated under some other pa- 
rameters, say {OQ,rio) . Indeed, for direct applications developped in this paper, 
Theorem [7] have to hold when the sample is generated under an other sampling 
scheme. Broniatowski and Caron (2011) JJf state in Theorem 11 that the ap- 
proximation scheme holds true in this case. 

Let Y^i be i.i.d. copies o/ Z with distribution Q and density q; assume that 
Q satisfies the Cramer condition J (exptx) q{x)dx < oo for t in a non void 
neighborhood of 0. Let Vi_„ := u (Yi) + ... + u (Y„) and define 

9«i,„ iVi) q ( I Vi ,„ = Ui,„) 

with distribution Quj „ . It then holds 

Theorem 3 Then, with the same hypotheses and notation as in Theorem]^ 

p (Xf = |Ui,„ - ui,„) = gu,AY^){l + OQ^^ ^ (e„ (logn)')). 

Also the total variation distance between Q^i „ and P^^ ^ goes to as n tends 
to infinity. 
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2.3 Comments on implementation 

The simulation of a sample with density „ is fast as easy. Indeed the 
r.v. with density g (^Xi+i\x\^ is obtained through a standard acceptance - 

rejection algorithm. When 9t is unknown, a preliminary estimator may be used. 
When Ui,„ is sufficient for „ it is nearly sufficient for its proxy „ (see next 
section) ; indeed changing the value of this preliminary estimator does not alter 
the likelihood of the sample; as shown in the simulations developped here after, 
any value of can be used; call 9* the 9 chosen as initial value , using henceforth 
Px,e* instead of px.fij, in ([31). In exponential families the values of the param- 
eters which appear in the gaussian component of g [xi+i\x\) in are easily 
calculated; note also that due to (jLILp the parameters in n {aP,(3, u {xi+i)) are 
such that the dominating density can be chosen for all i as px,e* ■ The constant 
in the acceptance rejection algorithm is then 1 / \/2na. This is in contrast with 
the case when the conditioning value is in the range of a large deviation with 
respect to px.e^,; in this case, which appears in a natural way in Importance 
sampling estimation for rare event probabilities, the simulation algorithm is 
more complex ; see [5|. 



3 Sufficient statistics and approximated condi- 
tional density 

3.1 Keeping sufficiency under the proxy density 

The density gm niUi) is used in order to handle Rao -Blackellisation of esti- 
mators or statistical inference for models with nuisance parameters. The basic 
property is sufficiency with respect to the envolvcd parameter. We show on 
some examples that g^ „(j/i ) defined in ([5]) inherits of the invariance with re- 
spect to a parameter when conditioning on a sufficient statistics pertaining to 
this parameter. 

Consider the Gamma density 

fp,e{x) := -^^^^^ exp-x/9 for a; > 0. (7) 

As r varies in and 9 is positive, the density runs in an exponential family 
7r,e with parameters r := p — 1 and 9, and sufficient statistics t{x) := logx and 
u{x) := X respectively for r and 9. Given an i.i.d. sample X" := {Xi, ...,X„) 
with density frx.OT resulting sufficient statistics are respectively Ti „ := 
\ogXi -I- ... -I- \ogXn and Ui^n := -I- ... -I- X„. We consider two parametic 
models {'~frT,9T^ ^ 0) and {'^rfiT : > 0) respectively assuming tt or 0t known. 

We first consider sufficiency of C/i,„ in the first model. The density gu^ ^ (yf) 
should be free of the current value of the true parameter 9t of the parameter 
under which the data are drawn. However as appears in ^ the unknown value 
9t should be used in its very definition. We show by simulation that whatever 
the value of 9 inserted in place of 9t in ^ the likelihood of X^ under g^ „ 
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Figure 1: Proxy of the conditional likelihood of under „ as a function of 
6 for n = 100 and fc = 80 in the gamma case. 



Figure 2: Proxy of the conditional likelihood of under gjj-^ „ as a function of 
r for n = 100 and fc = 80 in the gamma case. 



does not depend upon 0. We thus observe that Ui,n is sufficient for 0t in the 
conditional density approximating p^^ ^ as should hold as a consequence of 
Theorem [T] . 

Similarly the same fact occurs replacing 6t by in the model {'jr,eTJ r > 0) . 
In both cases whatever the value of the parameter 6 (Figure [ij or r (Figure 
[2]), the likelihood of X^ remains constant. 

We also consider the Inverse Gaussian distribution with density 



' A 
2^ 



-,1/2 



exp ^ 



for a; > 



(8) 



with both parameters A and /i be positive. Given an i.i.d. sample X" := 
{Xi, ...,Xn) with density /^,a, the resulting sufficient statistics are respectively 



■ Xn and Ui,: 



x: 



■ X„ . Similarly as for the Gamma 



case we draw the likelihood of a subsample X^ under g^^ , 
... + Xn , which is a sufficient statistics for /i (Figure [3]), 



with Ti^n '■— Xi 
and upon Ui^n ■ 



0.6 0.8 1.0 1.2 



Figure 3: Conditional likelihood of under „ as a function of ^ for n = 100 
and fc = 80 in the Inverse Gaussian case. 



Figure 4: Conditional likelihood of under gu^ „ as a function of A for n = 100 
and fc = 80 in the Inverse Gaussian case. 



X^^ + ... + X~^ which is sufficient for A (Figure |4]). In either cases the other 
coefficient is kept fixed at the true value of the parameter generating the sample. 
As for the Gamma case these curves show the invariance of the proxy of the 
conditional density with respect to the parameter for which the chosen statistics 
is sufficient. 



3.2 Rao-Blackwellisation 

Rao-Blackwell Theorem holds regardless of whether biased or unbiased esti- 
mators are used, since it reduces the MSE. Although its statement is rather 
weak, in practice, however, the improvement is often enormous. New interest in 
Rao-Blackwellisation procedures have risen in the recent years, conditioning on 
ancillary variables (see Fraser(2004) [13] for a survey on ancillaries in conditional 
inference); specific Rao-Blackwellisation schemes have been proposed by Casella 
and Robert [6], [7], Perron(1999)(2|, Done and Robert (2010) [28 and lacobucci 
et all. (2010) |T3]. The purpose is to improve the variance of a given statistics (for 
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example a tail probability) under a known distribution, through a simulation 
scheme under this distribution; the ancillary variables used in the simulation 
process itself are used as conditioning ones for the Rao-Blackwellisation of the 
statistics. The present approach is more classical in this respect, since we do not 
assume that the parent distribution is known; conditioning on a sufficient statis- 
tics Ui^n with respect to the parameter 6 and simulating samples according to 
the approximating density gui^„ will produce the improved estimator. 

Since Ui^n is sufficient for the parameter 9 in ^ it can be used in order 
to obtain improved estimators of 9t through Rao Blackwellization. We shortly 
illustrate the procedure and its results on some toy cases. Consider again the 
Gamma family defined here-above with canonical parameters r and 9. 

First the parameter to be estimated is 9t- A first unbiaised estimator is 
chosen as 

Given an i.i.d. sample X" with density ^tt.Ot the Rao-Blackwellised estimator 
of 9 is defined through 



whose variance is less than Var92- 

Consider fc = 2 in gui „{Vi) and let {Yi,Y2) be distributed according to 
Qui niVi)- Replications of (^1,12) induce an estimator of 9 jib 2 for fixed 
Iterating on the simulation of the runs X" produces, for n — 100 an i.i.d. sample 
of 9iiB,2's and the Var9RB.2 is estimated. The resulting variance shows a net 
improvement with respect to the estimated variance of ^2 ■ It is of some interest 
to confront this gain in variance as the number of terms involved in 9k increases 
together with fc. As fc approaches n the variance of 9k approaches the Cramer 
Rao bound. The graph below shows the decay in variance of 9k- We note that 
whatever the value of k the estimated value of the variance of 9jiB,k is constant. 
This is indeed an illustration of Lehmann-Scheffe's theorem. 

Remark 4 Lockhart and O'Reilly (2005) '19'l establish, under certain condi- 
tions and for fixed fc, the asymptotic equivalence of the plug-in estimate for the 
distribution P^Mt (-^i ^ ^) '^'^'^ Rao-Blackwell estimate P (Xj" e B\ Ui^„) 
where 9ml is the maximum likelihood estimator of 9t based on the whole sample 
X" (this result is known as Moore's conjecture (see Moore( 1973) 123] )). They 
also provide rates for this convergence. 



4 Exponential models with nuisance parameters 

4.1 Conditional inference in exponential families 

We consider the case when the parameter consists in two distinct subparameters, 
one of interest denoted 9 and a nuisance component denoted "q. As is well known, 
conditioning on a sufficient statistics for the nuisance parameter produces a 
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Figure 5: Variance of 9k, the initial estimator (dotted line), along with the 
variance of 9RB,k, the Rao-Blackwellised estimator (solid line) with n — 100 as 
a function ok k. 



new exponential family which is free of it. Assuming the observed dataset 
x" :— (xi,...,x„) resulting from sampling of a vector X" := (Xi,...,X„) of 
i.i.d. random variables with distribution in the initial exponential model, and 
denoting m(x") a sufficient statistics for 77, simulation of samples under the 
conditional distribution of X" given u (X") — u (x") and 9 = 60 for some 9o 
produces the basic ingredient for Monte Carlo tests with Hq : 9t — 9q where 
9t stands for the true value of the parameter of interest. Changing 0o for other 
values of the parameter of interest produces power curves as functions of the 
level of the test. This is the well known principle of Monte Carlo tests, and such 
is the goal of the present section. We consider a steep but not necessarily regular 
exponential family exponential family V :— {Pe,ri, {9, rj) € N} defined on M with 
canonical parametrization {9, rj) and minimal sufficient statistics {t, u) defined 
on M through the density 



PeA^) ' = exp [9t{x) + rju{x) - K{9, r/)] h{x). (9) 

For notational conveniency and without loss of generality both 9 and 77 belong 
to M. Also the model can be defined on M.'^, d > 1, at the cost of similar but 
more envolved tools. The natural parameter space is Af (which is a convex set 
in M^) defined as the effective domain of 

.(.,„).^exp|A-(M)l^/»p[''«W + ,.(.)|/.Wd.^ (10) 

Let A"" := (Ai, A„) be n i.i.d. replications of a general random variable 
X with density ©. Denote 

n n 

Ti,„:=^i(A,) and C/i,„ := ^ w(A,). (11) 

1=1 i=l 



11 



Basu (1977) [3] discusses ten different ways for eliminating the nuisance pa- 
rameters, among which conditioning on sufficient statistics and consider UMPU 
tests pertaining to the parameter of interest. In most cases, the density of Ti^n 
given Ui^n = ui_„ is unknown. Two main ways have been developped to deal 
with this issue: approximating this conditional density of a statistics or simu- 
lating samples from the conditional density. We compose these two approaches 
in the present paper. 

The classical technique is to approximate this conditional density using some 
expansion. Then integration produces critical values. For example, Pedersen 
(1979) [21] states the mixed Edgeworth-saddlepoint approximation, or the single 
saddlepoint approximation. However, the main issue of this technique is that 
the approximated density still depends on the nuisance parameter. In order to 
obtain the expansion, some suitable values for the parameter of interest and for 
the nuisance parameter have to be chosen. In the method developped here, as 
seen before, the conditional approximated density inherits of the invariance with 
respect to the nuisance parameter when conditioning on a sufficient statistics 
pertaining to this parameter. 

Rephrasing the notation of Section 2 in the present setting it holds that the 
MLE {6ml, Vml) satisfies 



OK (0,7,) 



= ui,n/n 

0ml ,Vml 



d-q 

and therefore ui,n/n converges to { ^^^q^'^^ ^ {vt) ■ 

For notational clearness denote /i the expectation of u (Xi) and cr^ its vari- 
ance under {Ott^t) j hence 

M M(eT..)T) — dKiOr, riT)/dri := (rfg^^rj^) d^K{eT, VT)/dr]'^ 

Assume at present 9t and tjt known. It holds 

(/)(r) E(^e-r..nT) exp[ru (X)] exp [K{0t, + r) ~ K{eT, Vt)] 



and 



Further 



TT, 



m-ir) = li{eT.riT+r) 



, expru(a:) , , , , , , 

Tr\ V{BT,m) \^) = V(BT,r,T+r) \X) (12) 



for any given a in the range of Pg^^riT- the above formula (|12p the parameter 
r denotes the only solution of the equation 

m(r) — a. 

For large k depending on n, using Monte Carlo tests based on runs of length k 
instead of n does not affect the accuracy of the results. 
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4.2 Application of conditional sampling to MC tests 

Consider a test defined through Hq ■ 9t = Oq versus Hi : Ot ^ Monte 
Carlo (MC) tests aim at obtaining p— values through simulation, where the 
distribution of the desired test statistics under is either unknown or very 
cumbersome to obtain; a comprehensive reference is J6ckel(1986), |15| . 

Recall the principle of thoses tests: denote t the observed value of the studied 
statistic based on the dataset and let ^2, --i^l the values of the resulting test 
statistics obtained through the simulation of L — 1 samples X" under Hq. If t 
is the Mth largest value of the sample (t, ^2, ^l), Hq will be rejected at the 
a = Mjh signifiance level, since the rank of t is uniformly distributed on the 
integer 2, L when iJg holds. Calculation of power functions can be handled 
similarly. The above approximation of the conditional density (xf) involves 
the unknown parameters Qt and r\T in all the simulation steps. This problem 
is solved when simulating under : 9t = Oq setting Oq in place of 9t and rjeg 
in place of r]T, where fig^ is the MLE of tit in the one parameter family peo,ri 
defined through This choice follows the commonly used one, as advocated 
for instance in |24) and |25) . Innumerous simulation studies support this choice 
in various contexts. 

Consider the problem of testing the null hypothesis Hq : 9t — do against the 
alternative Hi : 9t > do in model (jlO[) where rj is the nuisance parameter. 

When „.0o is known, the classical conditional test Hq : Ot = do versus 
Hi:9t> Oq with level a is UMPU. 

Substituting pg^ (X" = |Ui,„ = ui,„) by 5«i_„,eo (a^i) defined in ([6|), i.e. 
substituting the test statistics T" by T^f and pg^ (X^ = Xi |Ui_„ — Mi^„) by 
5mi,,i,9o (^i) i-S- changing the model for a proxy while keeping the same param- 
eter of interest 6 yields the conditional test with level a 

( 1 if Ti,fe > t„ 
[ if Ti,fc < to, 

and 

i.e. a := J ltifc>t„5ui„ (xi) dxi...dxk- Its power under a simple hypothesis 
6t — 9 is defined through 

= £'e[V'«(T'l,„, Ui,n)\Ul,n = Ul,7i]- 

Recall that the parametric bootstrap produces samples from a parametric 
model which is fitted to the data, often through maximum likelihood. In the 
present setting, the parameter 9 is set to 9o and the nuisance parameter 77 is 
replaced by its estimator rjgg which is the MLE of rj when the parameter 9 is fixed 
at the value 9o defining Ho- Comparing their exact conditional MC tests with 
parametric bootstrap ones for Gamma distributions, Lockhart et alf2007)[l9] 
conclude that no significant difference can be notices in terms of level or in 
terms of power. We proceed in the same vein, comparing conditional sampling 
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MC tests with the parametric bootstrap ones, obtaining again similar results 
when the nuisance parameter is estimated accurately. However the results are 
somehow different when the nuisance parameter cannot be estimated accurately, 
which may occiir in various cases. 

In practice since the chosen conditioning statistics is quasi sufficient for the 
nuisance parameter, we plug any value for this parameter in the definition of 
Qui^n - This is what has been performed in all examples below. 

4.3 Unimodal Likelihood: testing the coefficients of a Gamma 
distribution 

Let X" be an i.i.d. sample of random variables with Gamma distribution 
T{aT,bT) where gt is the shape coefficient and hx is the scale coefficient. 
As a and b vary this distribution is a two parameter exponential family. The 
statistics Ti^„ := Xi + ... + X„ is sufficient for the parameter a and J7i,„ := 
logXi + ... + logX„ is sufficient for b. 

MC conditional test with Hq : gt = ao Denote ui,„ = Y^^=i -^i ^^'^ ^ao 
the MLE of b. Calculate for I e {2, L] 

k 

ti :=5^iog(y,(0). 

i=0 

where the y/ are a sample from 

Consider the corresponding parametric bootstrap procedure for the same 
test, namely simulate Zi{l), 2 < I < L and < i < k with distribution 

r (ao,bao^ ; denote 

fc 

si :=^log(Zi(0). 

i=0 

In this example simulation shows that for any a the Mth largest value of the 
sample {t,t2, ■■■,tL) is very close to the corresponding empirical M/L-quantile 
of s;'s. Hence Monte Carlo tests through parametric bootstrap and conditional 
compete equally. Also in terms of power, irrespectively in terms of a and in 
terms of alternatives (close to HO), the two methods seem to be equivalent. 

MC conditional test with Hq : &t = &o Denote = Y^^=i log i-^i) ^^'^ 
flbo the MLE of a. Calculate for I G {2, L} 

k 

ti ■.= Y.YS) 

i=0 



14 



where the Yl are a sample from gui and, as above define accordingly 

k 

SI :=^log(Z,(0) 

1=0 

where the Zi{iys are simulated mider T (Sbo, bo) . 

As above, parametric bootstrap and conditional sampling yield equivalent 
Monte Carlo tests in terms of power function under alternatives close to HO. 

In the two cases studied above the value of k has been obtained through 
the rule exposed in section 3.2 of Broniatowski and Caron (2011) 4 . 

4.3.1 Bimodal likelihood: testing the mean of a normal distribution 
in dimension 2 

In contrast with the above mentionned examples, the following case study shows 
that estimation through the unconditional likelihood may fail to provide con- 
sistent estimators when the likelihood surface has multiple critical points. This 
in turn yields parametric bootstrap Monte Carlo tests with inacceptable power 
functions. 

Sundberg(2009) [301 proposes four examples that allow likelihood multimodal- 
ity. Two of them can also be found in [9^ and [1^, and in [2], Ch 2. We consider 
the "Normal parabola" model which is a curved (2, 1) family (see Example 2.35 
in [2], Ch 2 ). Two independent Gaussian variates have unknown means and 
known variances; their means are related by a parabolic relationship. 

Let X et Y he two independent gaussian r.v.'s with same variance with 
expectation ipx and ip'^. In the present example = 1 and ipT = 2. 

Let {Xi,Yi) , 1 < i < n be an i.i.d. sample with the above distribution. 

The parameter of interest is whislt the nuisance parameters is ip- Deriva- 
tion of the likelihood function of the observed sample with respect to ip yields 
the following equation 

(Ul^n - V') + 2^ (Fi,„ - = 

with Ui^n ■— Xi + ... -I- Xn and Vi^„ :— Yi + ... + y„. The following table shows 
that the likelihood function is bimodal in -0. 

Estimation of the nuisance parameter il) is performed through the standard 
Newton Raphson method. The Newton-Raphson optimizer of the likelihood 
function converges to the true value when the initial value is larger than land 
fails to converge to ijjT — 2 otherwise. Hencefore the parametric bootstrap 
estimation of the likelihood function of the sample based on this preliminary 
estimate of the nuisance parameter may lead to erroneous estimates of the pa- 
rameter of interest. Indeed according to the initial value we obtained estimators 
of V'T close to 2 or to —2. When the estimator of the nuisance parameter is close 
to its true value 2 then parametric bootstrap yields Monte Carlo tests with power 
close to 1 for any a and any alternative close to HQ. At the contrary when this 
estimate is close to the second maximizer of the likelihood (i.e. close to —2) then 
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Figure 6: Bimodal likelihood in ■(/;. 



the resulting Monte Carlo test based on parametric bootstrap has power close 
to irrespectively of the value of a and of the alternative, when close to HO. 
In contrast with these results, Monte Carlo tests based on conditional sampling 
provide powers close to 1 for any a; we have considered alternatives close to 
HO . This result is of course a consequence of quasi sufficiency of the statistics 
{Ui^n, Vi,n) for the parameter tp of the distribution of the sample (-'^i, 5^i)i=i ni 
see next paragraph for a discussion of this point. 

4.4 Estimation through conditional hkeUhood 

Considering model pop we intend to perform an estimation of 6t irrespectively 
upon the value of Denote rfg the MLE of 7/t when 9 holds; the model pg,fig {x) 
is a one parameter model which is fitted to the data for any peculiar choice of 
9. The classic unconditional likelihood provides consistent estimators of 9t in 
many cases. However, this method strongly relies on the constistency properties 
of f]g at any given 9. 

For fixed parameter value 9 of the parameter of interest. Theorem 1 means 
that the likelihood of the subsample with unknown distribution with pa- 
rameter (^Tj^yr) under the distribution with any parameter 9 given the value 
of the sufficient statistics Ui^n is approximated by gm „ (^i) when X^^ is either 
generated under the conditional density or under g^^ „ with parameter rj = rjT- 
Substituting rjT by its estimator should yield maximal value of the approximate 
likelihood when {9t, tit) holds, since rjg approaches rjT when 9 — 9t- In partic- 
ular, this holds when Xj^ is generated under 9t: tit which holds on the observed 
sample. This yields to an algorithm to estimate 9t- For any 9 calculate rje . 
Evaluate g^ „ (^i) optimize in 9. 

In most cases, as the normal, gamma or inverse-gaussian, both estimation 
through the unconditional likelihood and estimation through conditional likeli- 
hood based on the proxy g^^ ^ give a consistent estimator. 

We consider the example of the Bimodal likelihood from the above subsec- 
tion, inheriting of the notation and explore the behaviour of the proxy of the 
conditional likelihood of the sample {Xi,Yi) , 1 < i < n when conditioning on 
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Figure 7: Proxy of the conditional likelihood (solid line) along with the empirical 
likelihood (dotted line) as function of cr^ for n — 100 and fc = 99 in the case 
where a good initial point in Newton-Raphson procedure is chosen. 

Ui^n and Vi^n , as a function of a^. The likelihood writes 
L{a^\ {X,,Yi),l< i < n,Ui,n,Vi^n) 

where we have used the independence of the r.v.'s Xi's and F^'s. 

Applying Theorem 1 to the above expression it appears that ip cancels in the 
resulting density (7«i „ and g-u^ „■ This proves that the proxy of the conditional 
likelihood provides consistent estimation of cr^ as shown on Figures [7] and |8] (see 
the solid lines). 

On Figure [71 the dot line is the empirical likelihood function with consistent 
estimator of the nuisance parameter; the resulting maximizer in the variable 
cr^ is close to CTy = 1. At the opposite in Figure |8] an inconsistent preliminary 
estimator of ipT obtained through a bad tuning of the initial point in the Newton- 
Raphson procedure leads to unconsistency in the estimation of a^, the resulting 
likelihood function being unbounded. 
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